Word Sense Discovery and Disambiguation
Abstract
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses).
If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses.
If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets.
In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris’ hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes.
Similar resources
Word Sense Disambiguation of Ambiguous Persian Words Using the LDA Topic Model
Word sense disambiguation is the task of identifying the correct sense of a word in a given context among a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...
An Intelligent Hybrid Approach for Improving Recall in Electronic Discovery
In this work, we propose a hybrid method for improving recall in electronic discovery proceedings. This approach takes ideas from Natural Language Processing (Word sense disambiguation) and Information Retrieval in enhancing retrieval of responsive documents using the semantics of query terms instead of direct text matching. Preliminary results from disambiguation of user queries show that this...
An Improvement Method for The Ambiguous Fragments Discovery in Chinese Word Segmentation
Disambiguation is a difficult task in Chinese automatic word segmentation, and the discovery of ambiguous fragments is the foundation of disambiguation. This article proposes a method named Bidirectional Maximum Matching and Retroversion Multiword to discover the ambiguous fragments, which can deal with the overlapping ambiguity fragments of the long precision. Some experiments show that this ...
A Large Margin Approach to Anaphora Resolution for Neuroscience Knowledge Discovery
A discriminative large margin classifier based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method combining evidence from three methods, that use WordNet and Wikipedia, is also introduced and used for semantic features. The support vector machine anaphor...
Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing
Word Sense Disambiguation is the task dedicated to the problem of finding out the sense of a word in context, from all of its many possible senses. Solving this problem requires knowing the set of possible senses for a given word, which can be acquired from human knowledge or from automatic discovery, called Word Sense Induction. In this article, we adapt two existing meta-methods of Word Sens...
Word Domain Disambiguation via Word Sense Disambiguation
Word subject domains have been widely used to improve the performance of word sense disambiguation algorithms. However, comparatively little effort has been devoted so far to the disambiguation of word subject domains. The few existing approaches have focused on the development of algorithms specific to word domain disambiguation. In this paper we explore an alternative approach where word doma...
Publication date: 2005